Frequent changing pattern extraction device

ABSTRACT

A frequent changing pattern extraction device which extracts a frequent changing pattern from an ever-changing network structure includes: a conversion unit which converts each of a plurality of graph sequences into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, each graph sequence including a plurality of graphs that show temporal changes in the graphs and each of the graphs including a vertex corresponding to a data piece and an edge corresponding to a link between data pieces; and an extraction unit which extracts an operator subsequence that appears at least a predetermined number of times in the plurality of operator sequences corresponding to the plurality of graph sequences, based on the anti-monotonicity used in the Apriori algorithm.

TECHNICAL FIELD

The present invention relates to data mining techniques used for graph-based data, and particularly to a frequent changing pattern extraction device which extracts, from a sequence of graphs having temporal changes, a pattern of change that frequently appears in the sequence.

BACKGROUND ART

In recent years, there has been an increase in studies on data mining, which is used to discover useful or interesting patterns as knowledge from a massive amount of data. Usefulness varies from one person to another, and is thus difficult to define. In general, however, knowledge that explains many cases is considered to be useful (see Non-Patent Reference 6, for example). Ever since the Apriori algorithm was proposed in 1994, whereby frequent item sets are enumerated from data including plural item sets (see Non-Patent Reference 1, for example), frequent pattern enumeration algorithms have been proposed for various kinds of data structures. Recently, high-speed methods of enumerating frequent substructure patterns that appear in complex structures such as graphs have been proposed (see Non-Patent Reference 9, for example).

FIGS. 14 to 16 are diagrams for explaining one example of a method of enumerating frequent item sets using the Apriori algorithm. By using the Apriori algorithm, data combinations frequently appearing in plural data sets can be extracted at high speed, for example.

Consideration is given to the case where the data combinations which appear at least twice are to be extracted from four data sets, which are {R, Y, P}, {B, Y, G}, {R, B, Y, G}, and {B, G} as shown in FIG. 14. These data sets include five kinds of data pieces, which are R, B, Y, P, and G. Thus, as the data combinations, there are: five kinds of data combinations each including one piece of data (=₅C₁); ten kinds of data combinations each including two pieces of data (=₅C₂); ten kinds of data combinations each including three pieces of data (=₅C₃); five kinds of data combinations each including four pieces of data (=₅C₄); and one kind of data combination including five pieces of data (=₅C₅). In total, there are 31 kinds of data combinations.
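As a quick check of the total stated above, the count of non-empty combinations of five data pieces can be written as a single sum. This is only a restatement of the arithmetic in the preceding paragraph:

$$\sum_{k=1}^{5} \binom{5}{k} = 5 + 10 + 10 + 5 + 1 = 2^{5} - 1 = 31$$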

FIG. 15 is a diagram showing a search tree in which a vertex corresponds to a data combination. A vertex label shown in this diagram denotes the data combination as well as the number of data sets that include the present combination. For example, there are two data sets in which the data combination {R, Y} appears (namely, {R, Y, P} and {R, B, Y, G}). Thus, “RY₂” is described as the vertex label. In the diagram, the nearer a vertex is to the root, the fewer data pieces its data combination contains; the nearer it is to the leaves, the more data pieces it contains. Regarding the vertices connected with edges, the number of data pieces included in the data combination of a child vertex is larger by one than the number of data pieces included in the data combination of a parent vertex. In the case where a search is to be performed in the search tree according to an exhaustive search algorithm, the number of appearances needs to be calculated for each of the 31 data combinations.

FIG. 16 is a diagram for explaining a method of extracting a data combination which appears at least twice, according to the Apriori algorithm. First, the above-mentioned numbers of appearances are calculated for the combinations each including only one piece of data (namely, {R}, {B}, {Y}, {P}, and {G}). The results are twice, three times, three times, once, and three times, respectively. Since the data combination {P} appears only once, every other data combination that includes {P} appears fewer than twice. On account of this, the search does not need to be performed for the other data combinations including the data combination {P} (i.e., for descendant vertices of the vertex with the label P₁ in the search tree). Accordingly, the calculation of the numbers of appearances is terminated for these combinations. Similarly, out of the data combinations each including two pieces of data, the data combinations {R, B} and {R, G} appear once. Therefore, the calculation of the numbers of appearances for the other data combinations including these data combinations is terminated as well. Thus, the data combinations which appear at least twice can be obtained at high speed. As described so far, according to the Apriori algorithm, a search for a pattern which cannot reach the goal is terminated, and therefore a search for a frequent pattern can be made at high speed.
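The pruning described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used by the device; the function name `apriori` and the set-based data layout are assumptions made for the example, and the four data sets are the ones listed for FIG. 14.

```python
from itertools import combinations

def apriori(data_sets, min_count):
    """Return {combination: count} for combinations found in >= min_count data sets.

    Hypothetical helper for illustration; combinations are frozensets.
    """
    items = sorted({x for s in data_sets for x in s})
    candidates = [frozenset([x]) for x in items]
    frequent = {}
    while candidates:
        # Count in how many data sets each candidate combination appears.
        counts = {c: sum(1 for s in data_sets if c <= s) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: build combinations that are larger by one data piece.
        k = len(candidates[0]) + 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (anti-monotonicity): every (k-1)-subset must be frequent.
        candidates = [c for c in joined
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))]
    return frequent

data_sets = [{"R", "Y", "P"}, {"B", "Y", "G"}, {"R", "B", "Y", "G"}, {"B", "G"}]
result = apriori(data_sets, min_count=2)
for combo, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(combo)), count)
```

With a threshold of two, no combination containing P, {R, B}, or {R, G} is ever counted beyond the level at which it fails, which is the saving the paragraph describes.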

Targets of graph mining have mainly been graphs which do not change over time.

-   Non-Patent Reference 1: R. Agrawal & R. Srikant, Fast Algorithms for Mining Association Rules in Large Databases, Proceedings of Very Large Data Base, pp. 487-499, 1994.
-   Non-Patent Reference 2: A. Inokuchi et al., An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data, Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery, pp. 13-23, 2000.
-   Non-Patent Reference 3: A. Inokuchi, T. Washio, Y. Nishimura, & H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research Report, RT0448, February 2002.
-   Non-Patent Reference 4: M. Kuramochi & G. Karypis, Frequent Subgraph Discovery, Proceedings of International Conference on Data Mining, pp. 313-320, 2001.
-   Non-Patent Reference 5: M. Kuramochi & G. Karypis, Finding Frequent Patterns in a Large Sparse Graph, Proceedings of SIAM Data Mining, 2004.
-   Non-Patent Reference 6: H. Motoda, Fascinated by Explicit Understanding, Journal of the Japanese Society for Artificial Intelligence, pp. 615-625, 1999.
-   Non-Patent Reference 7: S. Nijssen & J. Kok, A Quickstart in Frequent Structure Mining can Make a Difference, Proceedings of International Conference on Knowledge Discovery and Data Mining, pp. 647-652, 2004.
-   Non-Patent Reference 8: J. Pei et al., PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth, Proceedings of International Conference on Data Engineering, pp. 215-224, 2001.
-   Non-Patent Reference 9: T. Washio & H. Motoda, State of the Art of Graph-based Data Mining, SIGKDD Explorations, Vol. 5, No. 1, pp. 59-68, 2003.
-   Non-Patent Reference 10: X. Yan & J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of International Conference on Data Mining, pp. 721-724, 2002.

SUMMARY OF THE INVENTION

Problems that Invention is to Solve

For example, in a human relation network represented by a graph, a person who is going to be a hub (a core or center) in the future does not act as a hub from the moment of first participating in the network. This person moves into a hub position while the network structure changes over time. Considering an entire graph as one community in the human relation network, the participation and withdrawal of persons respectively correspond to an increase and decrease in the number of vertices, and the changes caused by the resulting relations correspond to an increase or decrease in the number of edges. Similarly, a network structure configured by webpages changes its structure according to an increase or decrease in the number of webpages and hyperlinks over the course of a developmental process. Also, a gene network changes its network structure over the course of an evolutionary process including acquiring new genes, deleting genes, and mutating genes. A discussion thread can be considered as growth in a tree or directed acyclic graph where a new message causes a new vertex and a reference to a previous comment causes an edge. Studies on changes in network structures as described above are expected to become one of the important subjects in the future.

According to a conventional method of enumerating substructure patterns at high speed, however, frequent changing patterns cannot be extracted from a network structure that changes from moment to moment, because the processing targets of the conventional method are static data structures.

The present invention is conceived in view of the stated problem, and has an object to provide a frequent changing pattern extraction device which extracts a frequent changing pattern from a network structure that changes from moment to moment.

Means to Solve the Problems

In order to achieve the aforementioned object, the frequent changing pattern extraction device according to an aspect of the present invention is a frequent changing pattern extraction device including: a conversion unit which converts a graph sequence into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, the graph sequence including a plurality of graphs that show temporal changes in the graphs, and each of the graphs including a vertex corresponding to a data piece and an edge corresponding to a link between data pieces; and an extraction unit which extracts an operator subsequence that appears at least a predetermined number of times in the operator sequence, based on anti-monotonicity used in the Apriori algorithm.

To be more specific, the operations indicated by the operators include at least one of a vertex insertion, a vertex deletion, a vertex relabeling, an edge insertion, an edge deletion, and an edge relabeling.

With this configuration, changes in the graphs are expressed using the operators. Thus, the changes in the graphs (i.e., in the network structure) can be represented by the operator sequence. Based on the anti-monotonicity used in the Apriori algorithm, a frequent operator subsequence can be extracted. Since the operator sequence represents the changes in the graphs, a frequent pattern of change in the graphs can be extracted.

It is preferable that the stated frequent changing pattern extraction device further includes a sequence-for-union-graph generation unit which generates an operator sequence corresponding to a union graph obtained by removing a vertex that is not connected to another vertex from a graph configured by a union of vertices and a union of edges of the plurality of graphs included in the graph sequence, wherein the extraction unit extracts an operator subsequence that appears at least a predetermined number of times in the operator sequence generated by the sequence-for-union-graph generation unit, based on the anti-monotonicity used in the Apriori algorithm.

A graph which is not connected to a union graph is considered difficult for people to interpret. On account of this, a graph which is not connected to a union graph is removed, so that only the operator sequences included in the union graph become the targets of the processing. As a result, only operator subsequences (the patterns of change in the graphs) which are useful to people are extracted. Moreover, the number of operator sequences to be evaluated by the extraction unit can be reduced, and therefore the processing can be performed at high speed.

Also, it is preferable that the stated frequent changing pattern extraction device further includes an order changing unit which changes an order in which the operators included in the operator sequence converted by the conversion unit are arranged, so that the temporal changes in the graphs expressed by a resulting operator sequence are represented by vertices that increase in number over time, wherein the extraction unit extracts an operator subsequence that appears at least a predetermined number of times in the operator sequence obtained as a result of the order change executed by the order changing unit, based on the anti-monotonicity used in the Apriori algorithm.

By changing the order in which the operators are applied, it becomes easier to apply the anti-monotonicity used in the Apriori algorithm.

It should be noted that the present invention can be implemented not only as the frequent changing pattern extraction device including the characteristic units as described above, but also as: a frequent changing pattern extraction method having, as steps, the characteristic units included in the frequent changing pattern extraction device; and a program causing a computer to execute the characteristic steps included in the frequent changing pattern extraction method. In addition, it should be understood that such a program can be distributed via a recording medium such as a CD-ROM (Compact Disc-Read Only Memory) or a communication network such as the Internet.

Effects of the Invention

The present invention can provide a frequent changing pattern extraction device which extracts a frequent changing pattern from a network structure that changes from moment to moment.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram showing a functional configuration of a frequent changing pattern extraction device in an embodiment of the present invention.

FIG. 2 is a diagram showing an example of a graph sequence.

FIG. 3 is a diagram showing an example of an unreadable pattern.

FIG. 4 is a diagram showing a part of an input sequence.

FIG. 5 is a diagram showing an example of a sequence expressed using graph transformation operators.

FIG. 6 is a diagram showing an example of an output pattern.

FIG. 7 is a diagram showing an example of graph sequence representation of Table 3.

FIG. 8 is a diagram showing an example of graph sequence representation of Table 4.

FIG. 9 is a diagram showing an example of a search tree.

FIG. 10 is a diagram showing pseudocode of a method according to the breadth-first search algorithm.

FIG. 11 is a diagram showing variations in the calculation time with respect to variations in |DB|.

FIG. 12 is a diagram showing variations in the calculation time with respect to variations in $p'_i$.

FIG. 13 is a diagram showing variations in the calculation time with respect to variations in σ′.

FIG. 14 is a diagram showing an example of data sets.

FIG. 15 is a diagram showing a search tree and a result of a search made through the search tree according to an exhaustive search algorithm.

FIG. 16 is a diagram showing a result of a search made according to the Apriori algorithm.

NUMERICAL REFERENCES

-   10 Changing graph sequence storage unit
-   12 Conversion unit
-   14 Sequence-for-union-graph generation unit
-   16 Order changing unit
-   18 Extraction unit
-   20 Subsequence candidate generation unit
-   22 Appearance frequency calculation unit
-   33, 34, 35 Vertex
-   100 Frequent changing pattern extraction device

DETAILED DESCRIPTION OF THE INVENTION

The present invention proposes a method of efficiently enumerating frequent changing patterns buried in data of graph sequences that change over time, based on a graph mining technique.

A graph change, mentioned as a target of the present invention, refers to a structural change caused as a result of an increase or decrease in the number of vertices or edges. Information travelling across a network (namely, a graph) and a distance between vertices are also important elements as causes of the structural changes. However, in order to simplify the problem, the present invention is discussed with attention being focused only on graph structures.

The following is a description of a frequent changing pattern extraction device according to an embodiment of the present invention, with reference to the drawings.

FIG. 1 is a block diagram showing a functional configuration of a frequent changing pattern extraction device according to the embodiment of the present invention.

A frequent changing pattern extraction device 100 is a device which extracts a frequent changing pattern from a graph sequence that changes over time. The frequent changing pattern extraction device 100 includes a changing graph sequence storage unit 10, a conversion unit 12, a sequence-for-union-graph generation unit 14, an order changing unit 16, an extraction unit 18, a subsequence candidate generation unit 20, and an appearance frequency calculation unit 22. The frequent changing pattern extraction device 100 is configured by a computer, and the changing graph sequence storage unit 10 is configured by a memory of the computer or an external storage device such as a hard disk. Processes performed by the other processing units are implemented by causing a CPU of the computer to execute programs. It should be noted that intermediate results given by the processing units are stored in the memory of the computer. Also note that a processing result is displayed on a display device (not illustrated) of the computer.

The changing graph sequence storage unit 10 is a storage device which stores a plurality of graph sequences, each graph sequence including a plurality of graphs that show temporal changes in the graphs, and each of the graphs including a vertex corresponding to a data piece and an edge corresponding to a link between data pieces.

The conversion unit 12 is a processing unit which converts each of the graph sequences stored in the changing graph sequence storage unit 10 into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph.

The sequence-for-union-graph generation unit 14 is a processing unit which generates, for each operator sequence of the graph sequence, another operator sequence corresponding to a union graph obtained by removing a vertex that is not connected to another vertex from a graph configured by a union of vertices and a union of edges of the plurality of graphs included in the graph sequence.

The order changing unit 16 is a processing unit which, for each operator sequence generated by the sequence-for-union-graph generation unit 14, changes the order in which the operators included in the operator sequence are arranged, so that the temporal changes in the graphs expressed by the operator sequence are represented by the vertices that increase in number over time.

The extraction unit 18 is a processing unit which extracts an operator subsequence that appears at least a predetermined number of times in the plurality of operator sequences corresponding to the plurality of graph sequences, based on the anti-monotonicity used in the Apriori algorithm. The extraction unit 18 includes the subsequence candidate generation unit 20 and the appearance frequency calculation unit 22.

The subsequence candidate generation unit 20 is a processing unit which generates operator subsequence candidates while increasing the number of included operators by one each time.

The appearance frequency calculation unit 22 is a processing unit which calculates the number of times the operator subsequence candidate appears in the plurality of operator sequences.

It should be noted that the subsequence candidate generation unit 20 updates the operator subsequence candidates by increasing the number of operators by one only for those candidates whose number of appearances, as calculated by the appearance frequency calculation unit 22, is the predetermined number of times or more.
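The interplay between the subsequence candidate generation unit 20 and the appearance frequency calculation unit 22 can be sketched as a level-wise loop. The following Python is an illustrative sketch only: the function names are hypothetical, operators are plain tuples, and operators are matched literally (including their unique IDs), which is a simplifying assumption of this example and not a statement about how the device matches patterns.

```python
def is_subsequence(candidate, sequence):
    """True if candidate can be obtained from sequence by removing operators."""
    remaining = iter(sequence)
    # `op in remaining` advances the iterator, so order is respected.
    return all(op in remaining for op in candidate)

def mine_frequent_subsequences(sequences, min_count):
    """Return {operator subsequence (tuple): number of sequences containing it}."""
    # Distinct operators in first-seen order (no sorting needed).
    alphabet = list(dict.fromkeys(op for s in sequences for op in s))
    candidates = [(op,) for op in alphabet]
    frequent = {}
    while candidates:
        survivors = []
        for cand in candidates:
            # Role of the appearance frequency calculation unit.
            count = sum(1 for s in sequences if is_subsequence(cand, s))
            if count >= min_count:
                frequent[cand] = count
                survivors.append(cand)
        # Role of the candidate generation unit: extend only frequent
        # candidates by one operator (anti-monotonicity prunes the rest).
        frequent_ops = [c[0] for c in frequent if len(c) == 1]
        candidates = [c + (op,) for c in survivors for op in frequent_ops]
    return frequent

# Hypothetical operator sequences made of vertex insertions.
a, b, c = ("vi", 1, "A"), ("vi", 2, "B"), ("vi", 3, "C")
sequences = [[a, b, c], [a, c, b], [a, b]]
result = mine_frequent_subsequences(sequences, min_count=2)
```

Here the candidate `(b, c)` appears in only one sequence, so nothing that extends it is ever generated or counted.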

The processing performed by the frequent changing pattern extraction device 100 configured as described above is explained below.

<1. Problem Definition>

FIG. 2 is a diagram showing an example of a changing graph sequence stored in the changing graph sequence storage unit 10. In the diagram, $g^{(t)}$ denotes the t-th graph in the sequence, and each $g^{(t)}$ is a labeled graph. An object of the present invention is to propose an algorithm for enumerating frequent changing patterns from such a changing graph sequence. In order to achieve this object, a first problem is how to concisely express changes in the graphs and, at the same time, to minimize a search space by reducing the variety of possible expressions. In FIG. 2, $g^{(1)}$ and $g^{(2)}$ have the substructure configured by three vertices in common. With this being the case, holding information of all vertices and all edges for each t does not lead to a concise expression. To address this problem in the present invention, consideration is given to the case where a changing graph sequence is expressed using a description based on a difference between $g^{(t)}$ and $g^{(t+1)}$.

A second problem is what kind of characteristics a pattern p to be searched for should have, the pattern being expressed as $p = \langle g_s^{(1)} \ldots g_s^{(m)} \rangle$. For example, when the graph sequence is searched without constraints on each graph $g_s^{(t)}$, an enormous number of patterns become subjects of the search. Also, an output pattern may not always be interpretable. In the case where a disconnected graph is allowed as $g_s^{(t)}$, for instance, a pattern as shown in FIG. 3 may be outputted. Let the pattern shown in FIG. 3 be a network structure of webpages. In this case, the shown pattern is a subsequence that can exist everywhere, such as a structure where vertices B and C correspond to webpages of Washio Laboratory whereas a vertex A corresponds to a webpage of an organization in Brazil. Thus, it is highly possible for this pattern to be extracted as a frequent pattern. However, since there is no association between A and B, such a pattern is usually difficult to interpret and thus may fall outside people's interest. On the other hand, under the constraint that each graph at t is connected, a pattern such as the one shown in FIG. 2 is not searched for. However, although vertices 33 and 34 are not connected in each graph at t, these vertices are thought to be associated in some way with each other via a vertex 35. On this account, such a pattern as shown in FIG. 2 is desired to be a search target. In terms of versatility, it is better for a search-target pattern to have fewer constraints. As described thus far, which patterns should be search targets is not obvious for the problem addressed by the present invention. Accordingly, the definition of patterns is discussed as well.

A labeled graph g is defined as $g = (V, E, L, f)$. Here, V represents a set of vertices and is expressed as $V = \{v_1, v_2, \ldots, v_n\}$.

$E = \{(v_i, v_j) \mid (v_i, v_j) \in V \times V\}$  [Math. 1]

In the above, E represents a set of edges. L represents a set of labels.

$f : (V \cup E \rightarrow L)$  [Math. 2]

Also, f is expressed as above. In the present invention, undirected graphs are discussed according to a proposed method. However, the present invention is applicable to directed graphs. Suppose here that the graph g and the graph $g_s$ expressed as $g_s = (V_s, E_s, L_s, f)$ satisfy the following equations for a function φ that maps the vertices of $g_s$ to vertices of g.

[Math. 3]

1. $\forall v_i \in V_s,\ f(v_i) = f(\phi(v_i))$
2. $\forall (v_i, v_j) \in E_s,\ f(v_i, v_j) = f(\phi(v_i), \phi(v_j))$

When such a function φ exists, $g_s$ is referred to as a subgraph of g and expressed as follows.

$g_s \sqsubseteq g$  [Math. 4]

A set of edges connecting from a vertex $v_i$ to a vertex $v_j$ is called a path. When a path is present between any two vertices of a graph, this graph is called a connected graph. A graph sequence is expressed as $d = \langle g^{(1)} g^{(2)} \ldots g^{(n)} \rangle$. The object of the present invention is to provide a method of searching for and finding a frequent sequence $p = \langle g_s^{(1)} g_s^{(2)} \ldots g_s^{(m)} \rangle$ when the graph sequence d is given as an input. Here, the following expression is given in the case where $1 \le j_1 < j_2 < \ldots < j_m \le n$.

$g_s^{(1)} \sqsubseteq g^{(j_1)},\ g_s^{(2)} \sqsubseteq g^{(j_2)},\ \ldots,\ g_s^{(m)} \sqsubseteq g^{(j_m)}$  [Math. 5]

Here, p is described as follows.

$p \sqsubseteq d$  [Math. 6]

EXAMPLE 1

A network of webpages has a graph structure where a vertex corresponds to a webpage and an edge corresponds to a hyperlink, for example. The graph structure changes whenever an edit is performed. For instance, $g^{(t)}$ has a graph structure in the t-th phase of a certain website. Although each page may be considered to be unlabeled, it may also be considered to be labeled such as “Webpage of University”, “Webpage of Financial Company”, or “Webpage of Manufacturing Company”. A label is set according to the intention of analysis, and is not specifically designated in the present invention.

A union graph is defined in order to discuss what kind of pattern is to be searched for. Each vertex $v_i$ of a graph has a unique ID $id(v_i)$ that does not change over time. In the aforementioned example of webpages, URLs correspond to the unique IDs. When a set of graphs expressed as $\{g_1, \ldots, g_n\}$ is given, the union graph of Math. 7 described below is defined by Math. 8 as follows.

$G = \bigcup_i g_i$  [Math. 7]

$V(G) = \bigcup_i \{id(v) \mid v \in V(g_i)\}$

$E(G) = \bigcup_i \{(id(v_1), id(v_2)) \mid (v_1, v_2) \in E(g_i)\}$  [Math. 8]

Here, $V(g_i)$ and $E(g_i)$ represent a set of vertices and a set of edges of the graph $g_i$, respectively.

$\bigcup_i g_i$  [Math. 9]

The number of vertices in the above expression is the cardinality of the unique IDs of the vertices of $\{g_1, \ldots, g_n\}$. According to the definition as described, a target pattern in the present invention can be defined as follows. Suppose that a pattern is expressed as $p = \langle g_s^{(1)} g_s^{(2)} \ldots g_s^{(m)} \rangle$.

Here, a search is made for a graph sequence p where the following is connected.

$\bigcup_{i=1,\ldots,m} g_s^{(i)}$  [Math. 10]

Note that the vertices included in the graph sequence p that satisfies this condition are “associated with each other”. Although each $g_s^{(i)}$ appearing in the pattern may be disconnected, any two vertices in the pattern are associated with each other within a target phase. Hence, each output pattern is readable (i.e., interpretable), which does not violate the aforementioned object.
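The union graph and the connectivity condition can be illustrated with a short Python sketch. The representation (a graph as a pair of a vertex-ID set and a set of ID pairs), the function names, and the two example graphs are assumptions made for illustration; labels are omitted because the definition of the union graph uses only unique IDs.

```python
from collections import deque

def union_graph(graphs):
    """Union of vertex IDs and union of edges over a list of (vertices, edges)."""
    vertices, edges = set(), set()
    for v, e in graphs:
        vertices |= set(v)
        # Normalise undirected edges so (1, 3) and (3, 1) coincide.
        edges |= {tuple(sorted(pair)) for pair in e}
    return vertices, edges

def is_connected(vertices, edges):
    """True if a path exists between any two vertices (breadth-first search)."""
    if not vertices:
        return True
    adjacent = {v: set() for v in vertices}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        for n in adjacent[queue.popleft()]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == set(vertices)

# Hypothetical two-step pattern: vertices 1 and 2 are never directly linked,
# but both are linked to vertex 3 at some step.
g1 = ({1, 3}, {(1, 3)})
g2 = ({1, 2, 3}, {(2, 3)})
print(is_connected(*g2))                     # g2 alone is disconnected
print(is_connected(*union_graph([g1, g2])))  # the union graph is connected
```

In this example the second graph is disconnected on its own, yet the pattern would still qualify as a search target because its union graph is connected.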

Documents (see Non-Patent Reference 5, for example) have proposed the SIGRAM algorithm whereby frequent subgraphs are mined from a huge graph that does not change over time. Although the SIGRAM algorithm proposes a frequency counting method, the FSG algorithm that is an existing graph mining method (see Non-Patent Reference 4, for example) is employed as the pattern enumeration method. In other words, the pattern enumeration method and the frequency counting method can be separately defined, and the same can be said for the problem to be addressed by the present invention. On account of this, the present invention focuses on a pattern enumeration method and accordingly proposes an efficient enumeration method. Suppose that an input database DB is a collection of graph sequences $d_i$ and data identifiers $tid_i$, and is expressed as $DB = \{(tid_i, d_i) \mid d_i = \langle g_i^{(1)} g_i^{(2)} \ldots g_i^{(t_i)} \rangle\}$. For such a database, the support is defined as follows.

$\sigma(p) = |\{tid_i \mid (tid_i, d_i) \in DB,\ p \sqsubseteq d_i\}| / |DB|$  [Math. 11]

A pattern whose support is equal to or higher than a specified support threshold σ′ is referred to as a frequent pattern.
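The support of Math. 11 can be sketched as follows. To keep the example short, it assumes that a pattern graph is matched against a data graph by unique ID, that is, with the mapping φ fixed to the identity; the general definition allows any suitable φ and would require a subgraph matching routine instead. All names and the small database are hypothetical.

```python
def graph_contained(gs, g):
    """Simplified subgraph test: every labeled vertex and edge of gs is in g.

    A graph is (vertex_labels, edge_labels), both dicts keyed by unique IDs.
    Assumption: vertices are matched by ID rather than by searching for phi.
    """
    vs, es = gs
    v, e = g
    return (all(v.get(i) == label for i, label in vs.items())
            and all(e.get(k) == label for k, label in es.items()))

def sequence_contained(pattern, data):
    """True if the pattern graphs match data graphs at increasing positions."""
    remaining = iter(data)
    # any() stops at the first match, so later pattern graphs continue from there.
    return all(any(graph_contained(gs, g) for g in remaining) for gs in pattern)

def support(pattern, db):
    """Fraction of (tid, d) entries in db whose sequence d contains the pattern."""
    return sum(1 for _tid, d in db if sequence_contained(pattern, d)) / len(db)

g_a = ({1: "A"}, {})
g_ab = ({1: "A", 2: "B"}, {(1, 2): "-"})
db = [
    ("t1", [g_a, g_ab]),
    ("t2", [g_ab, g_a]),
    ("t3", [g_a, g_ab, g_ab]),
    ("t4", [g_a]),
]
sigma = support([g_a, g_ab], db)
print(sigma)
```

The pattern is contained in the sequences of `t1` and `t3` only, so with a threshold σ′ of 0.5 it would be reported as frequent in this toy database.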

Next, a first problem of pattern enumeration is explained.

<Pattern Enumeration Problem 1 (Simple Problem)>

Suppose that a collection of graph sequences expressed as $DB = \{(tid_i, d_i) \mid d_i = \langle g_i^{(1)} \ldots g_i^{(t_i)} \rangle\}$ and σ′ are given as inputs.

In this case, the problem is to enumerate each frequent pattern p expressed as $p = \langle g_s^{(1)} \ldots g_s^{(m)} \rangle$, where the following is connected.

$\bigcup_i g_s^{(i)}$  [Math. 12]

Each graph $g_s^{(t)}$ included in the graph sequence as a pattern is not always connected. The simplest pattern enumeration algorithm is: to run a frequent subgraph enumeration algorithm that also outputs disconnected graphs; to perform existing sequential pattern mining, with each frequent subgraph being an item; and then to remove, in the post-processing, any pattern whose union graph is not connected. However, this method is inefficient because patterns that do not satisfy the condition that the union graph of the pattern is connected are obtained in large numbers immediately before the post-processing.

Also, consider a method of expanding the pattern by adding an item $i_k$ one at a time in the temporal order, as in the case of conventional sequential pattern mining (see Non-Patent Reference 8, for example). When a pattern desired to be extracted is $i_1\, i_2\, (i_2\, i_3)\, i_4$, the pattern is expanded in order as follows: $i_1$; $i_1\, i_2$; $i_1\, i_2\, (i_2)$; $i_1\, i_2\, (i_2\, i_3)$; and $i_1\, i_2\, (i_2\, i_3)\, i_4$. A new item always has to be appended to the item that occurs most recently in the temporal order. However, in the case where an analysis target is a graph and it is known in advance that the pattern shown in FIG. 2 is one of the frequent patterns, $g_s^{(2)}$ can be generated by adding a darkest-shaded vertex to $g_s^{(1)}$ of FIG. 2. On the other hand, when $\langle g_s^{(1)} g_s^{(2)} \rangle$ is frequent and $\langle g_s^{(1)} g_s^{(2)} g_s^{(3)} \rangle$ is infrequent, it is useless and inefficient to add the darkest-shaded vertex. The search is performed in a state where frequent patterns are unknown in advance. Hence, an efficient search method is necessary to achieve the aforementioned object.

Regarding relevance to the problem of existing frequent subgraph mining, when each $t_i$ of Pattern Enumeration Problem 1 is 1, this is the same problem addressed by the algorithms of AcGM (see Non-Patent Reference 3, for example), FSG (see Non-Patent Reference 4, for example), and gSpan (see Non-Patent Reference 10, for example). Moreover, when $t_i = 1$, the constraint on the union graph is canceled, and a constraint is imposed that to-be-extracted patterns are included as induced subgraphs in the graphs of the database, this is the same problem addressed by the AGM algorithm (see Non-Patent Reference 2, for example).

<2. Graph Transformation Operators>

The conversion unit 12 holds only differences between $g^{(t)}$ and $g^{(t+1)}$ using one of the methods of determining graph edit distances, in order to express changes in the graphs. To be more specific, the degree of similarity between two graphs is determined according to the smallest number of insertions, deletions, and relabelings of vertices and edges that must be applied recursively until the two graphs become identical. Operators used for performing the six kinds of operations shown in Table 1 are referred to as transformation operators.

TABLE 1 Graph Transformation Operators

| Operation | Operator | Description |
|---|---|---|
| Vertex insertion | $OP^{(t)}_{[vi, i, l]}\, g^{(t)}$ | A vertex with label l is inserted into $g^{(t)}$. The unique ID of the inserted vertex is i. The inserted vertex has no edges. |
| Vertex deletion | $OP^{(t)}_{[vd, i, l]}\, g^{(t)}$ | The vertex with unique ID i is deleted from $g^{(t)}$. Only isolated vertices are targets. When deleting a non-isolated vertex, $OP^{(t)}_{[ed, (i, j), l]}$ is applied a few times in advance. |
| Vertex relabeling | $OP^{(t)}_{[vr, i, l]}\, g^{(t)}$ | The label of the vertex with unique ID i is relabeled to l. |
| Edge insertion | $OP^{(t)}_{[ei, (i, j), l]}\, g^{(t)}$ | An edge with label l is inserted between the vertices with unique IDs i and j in $g^{(t)}$. |
| Edge deletion | $OP^{(t)}_{[ed, (i, j), l]}\, g^{(t)}$ | The edge between the vertices with unique IDs i and j is deleted from $g^{(t)}$, l being the label of the to-be-deleted edge. |
| Edge relabeling | $OP^{(t)}_{[er, (i, j), l]}\, g^{(t)}$ | The label of the edge between the vertices with unique IDs i and j is relabeled to l. |

Holding differences between $g^{(1)}$ and the subsequent graphs is one way. However, considering that $g^{(0)}$ has no vertices, data including a difference between $g^{(0)}$ and $g^{(1)}$ is held so as to process the data uniformly. Hereafter, $g^{(0)}$ is expressed as follows.

$\bot$  [Math. 13]

Even in the case where each graph is relatively large, data can be concisely held if the changing parts are small in number.
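One way to obtain such differences, sketched here under stated assumptions, is a direct comparison of two temporally adjacent graphs keyed by unique ID. Because vertex IDs do not change over time, no search over vertex correspondences is needed in this sketch. The tuple layout `(kind, target, label, t)`, the function names, and the example graphs are hypothetical. Operators are emitted so that edges are deleted before their vertices, consistent with the rule in Table 1 that only isolated vertices are deleted, and vertices are inserted before edges that use them.

```python
def diff_graphs(prev, nxt, t):
    """List the operators that transform graph prev into graph nxt at step t.

    A graph is (vertex_labels, edge_labels): dicts keyed by unique vertex ID
    and by (i, j) ID pairs. An empty pair ({}, {}) stands for the empty graph.
    """
    (pv, pe), (nv, ne) = prev, nxt
    ops = []
    ops += [("ed", e, pe[e], t) for e in pe if e not in ne]   # edge deletions
    ops += [("vd", i, pv[i], t) for i in pv if i not in nv]   # vertex deletions
    ops += [("vr", i, nv[i], t) for i in pv if i in nv and pv[i] != nv[i]]
    ops += [("er", e, ne[e], t) for e in pe if e in ne and pe[e] != ne[e]]
    ops += [("vi", i, nv[i], t) for i in nv if i not in pv]   # vertex insertions
    ops += [("ei", e, ne[e], t) for e in ne if e not in pe]   # edge insertions
    return ops

def apply_ops(graph, ops):
    """Apply operators in order and return the resulting graph."""
    v, e = dict(graph[0]), dict(graph[1])
    for kind, target, label, _t in ops:
        if kind in ("vi", "vr"):
            v[target] = label
        elif kind == "vd":
            del v[target]
        elif kind in ("ei", "er"):
            e[target] = label
        elif kind == "ed":
            del e[target]
    return v, e

empty = ({}, {})
g1 = ({1: "A", 2: "C"}, {(1, 2): "-"})   # hypothetical graph at t = 1
g2 = ({2: "C", 3: "C"}, {(2, 3): "-"})   # hypothetical graph at t = 2
ops0 = diff_graphs(empty, g1, 0)
ops1 = diff_graphs(g1, g2, 1)
print(ops0 + ops1)
```

Applying `ops0` to the empty graph reproduces `g1`, and applying `ops1` to `g1` reproduces `g2`, so only the changed parts need to be stored.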

EXAMPLE 2

A sequence shown in FIG. 4 is considered, for instance. The sequence shown in FIG. 4 can be expressed by a sequence of insertions and deletions of vertices and edges as shown in FIG. 5. A numerical superscript assigned to each individual vertex represents the unique ID of the vertex. Here, the changes in the graphs can be expressed as follows.

$g^{(2)} = OP_{[ei,(1,2),-]}^{(1)} OP_{[vi,3,C]}^{(1)} OP_{[vd,1,A]}^{(1)} OP_{[ed,(1,2),-]}^{(1)} OP_{[ei,(1,2),-]}^{(0)} OP_{[vi,2,C]}^{(0)} OP_{[vi,1,A]}^{(0)} \bot$  [Math. 14]

When the data $d_i$ in the database is expressed as $d_i = \langle g_i^{(1)} g_i^{(2)} \ldots g_i^{(n)} \rangle$, this expression is referred to as the graph sequence representation.

$g^{(n)} = OP_{[*, o_k, l_k]}^{(n-1)} \cdots OP_{[*, o_1, l_1]}^{(0)} OP_{[*, o_0, l_0]}^{(0)} \bot$  [Math. 15]

When expressed as the above, this expression is referred to as the transformation operator representation.

$OP_{[*, o_0, l_0]}^{(0)} \cdots OP_{[*, o_k, l_k]}^{(n-1)}$  [Math. 16]

When expressed as the above, this expression is referred to as the transformation operator sequence representation. Suppose that an operator expressed as below is included in s of the transformation operator sequence representation.

$OP_{[*, o, l]}^{(t)}$  [Math. 17]

In this case, the following expression is given.

$OP_{[*, o, l]}^{(t)} \in s$  [Math. 18]

Also, the transformation operator sequence representation corresponding to d in the graph sequence representation is described as seq(d).
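A conversion from the graph sequence representation to seq(d) can be sketched as follows, assuming that every vertex already carries its unique ID, so that the minimal edit script between adjacent graphs is a plain difference. The function names `diff_ops` and `seq`, the tuple layout, and the order of operators within one step are assumptions of this sketch, not the procedure of the conversion unit 12. Relabeling operators are emitted only when a vertex or edge keeps its ID but changes its label.

```python
def diff_ops(g_prev, g_next, t):
    """Operators transforming g_prev into g_next, each tagged with step t.

    A graph is a pair (vertices, edges): vertices maps ID -> label and
    edges maps (i, j) with i < j -> label.
    """
    v1, e1 = g_prev
    v2, e2 = g_next
    ops = []
    # Edges are deleted first so that vertices become isolated before deletion.
    for e in sorted(e1.keys() - e2.keys()):
        ops.append(("ed", e, e1[e], t))
    for v in sorted(v1.keys() - v2.keys()):
        ops.append(("vd", v, v1[v], t))
    for v in sorted(v1.keys() & v2.keys()):
        if v1[v] != v2[v]:
            ops.append(("vr", v, v2[v], t))
    # Vertices are inserted before the edges that attach to them.
    for v in sorted(v2.keys() - v1.keys()):
        ops.append(("vi", v, v2[v], t))
    for e in sorted(e2.keys() - e1.keys()):
        ops.append(("ei", e, e2[e], t))
    for e in sorted(e1.keys() & e2.keys()):
        if e1[e] != e2[e]:
            ops.append(("er", e, e2[e], t))
    return ops


def seq(graphs):
    """Transformation operator sequence for graphs [g(1), ..., g(n)], starting from the empty graph."""
    ops, prev = [], ({}, {})
    for t, g in enumerate(graphs):
        ops.extend(diff_ops(prev, g, t))
        prev = g
    return ops


# The four graphs g(1)..g(4) implied by the operator applications of Table 2.
g1 = ({1: "red", 2: "blue", 3: "blue"}, {(1, 2): "-", (2, 3): "-"})
g2 = ({1: "red", 2: "blue", 3: "blue", 4: "red"}, {(1, 2): "-", (3, 4): "-"})
g3 = ({2: "blue", 3: "blue", 4: "red"}, {(3, 4): "-", (2, 4): "-"})
g4 = ({2: "blue", 3: "blue", 4: "red"}, {(2, 3): "-", (2, 4): "-", (3, 4): "-"})

operators = seq([g1, g2, g3, g4])
```

The result contains the same twelve operators as Table 2, although operators sharing the same t may appear in a different order.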

Let a transformation operator sequence representation s be given as follows.

$OP_{[*, o_0, l_0]}^{(0)} \cdots OP_{[*, o_{n-1}, l_{n-1}]}^{(n-1)}$  [Math. 19]

A sequence s′ that is generated by removing some operators from s is referred to as a subsequence of s, and this relation is expressed as follows.

$s' \sqsubseteq s$  [Math. 20]

When the sequence s′ is a subsequence of the sequence s, let their correspondence relation be expressed using φ, for the following.

$OP_{[*, o, l]}^{(t)} \in s,\ OP_{[*, o', l']}^{(t')} \in s'$  [Math. 21]

In this case, the correspondence relation is expressed as below.

$OP_{[*, o, l]}^{(t)} = \phi(OP_{[*, o', l']}^{(t')})$  [Math. 22]

<Assumption 1> A transformation operator is generated according to the shortest edit distance between $g^{(t)}$ and $g^{(t+1)}$. Suppose that the following expressions in Math. 23 and Math. 24 are included in one transformation operator representation.

$OP_{[vi, o_1, l]}^{(t_1)}$  [Math. 23]

$OP_{[vd, o_2, l]}^{(t_2)}$  [Math. 24]

Here, note that there is no combination of values with $t_1 = t_2$ and $o_1 = o_2$, that is, no vertex is inserted and then immediately deleted.

Suppose that the transformation operator sequence representation s is given as follows.

$s = OP_{[*, o_1, l_1]}^{(0)} \cdots OP_{[*, o_k, l_k]}^{(n-1)}$  [Math. 25]

When the above equation is given, a union graph G of s expressed as G=(V, E) is defined as follows.

$V(G) = \{ o \mid OP_{[q, o, l]}^{(t)} \in s,\ q \in \{vi, vd, vr\} \}$
$E(G) = \{ o \mid OP_{[q, o, l]}^{(t)} \in s,\ q \in \{ei, ed, er\} \}$  [Math. 26]

Also, for $DB = \{ (tid_i, d_i) \mid d_i = \langle g_i^{(1)} \ldots g_i^{(t_i)} \rangle \}$, the support of the pattern s in the transformation operator sequence representation is expressed as follows.

$\sigma(s) = |\{ tid_i \mid (tid_i, d_i) \in DB,\ s \sqsubseteq seq(d_i) \}| / |DB|$  [Math. 27]

The union graph G is generated by the sequence-for-union-graph generation unit 14.
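The union graph of Math. 26 and the connectivity test that Pattern Enumeration Problem 2 relies on can be sketched as follows. The function names and the tuple layout `(kind, object, label, t)` are hypothetical. As an assumption of this sketch, the endpoints of edges in E(G) are also treated as vertices when connectivity is checked, and an empty union graph is treated as not connected.

```python
VERTEX_KINDS = {"vi", "vd", "vr"}
EDGE_KINDS = {"ei", "ed", "er"}


def union_graph(s):
    """Return (V, E) of the union graph for an operator sequence s."""
    V = {o for kind, o, _label, _t in s if kind in VERTEX_KINDS}
    E = {tuple(sorted(o)) for kind, o, _label, _t in s if kind in EDGE_KINDS}
    return V, E


def is_connected(V, E):
    """Depth-first search over the union graph; edge endpoints count as vertices."""
    nodes = set(V) | {v for e in E for v in e}
    if not nodes:
        return False
    adjacent = {v: set() for v in nodes}
    for i, j in E:
        adjacent[i].add(j)
        adjacent[j].add(i)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adjacent[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes


# Operators of Table 2 as (kind, object, label, t).
pattern = [
    ("vi", 1, "red", 0), ("vi", 2, "blue", 0), ("vi", 3, "blue", 0),
    ("ei", (1, 2), "-", 0), ("ei", (2, 3), "-", 0),
    ("vi", 4, "red", 1), ("ei", (3, 4), "-", 1), ("ed", (2, 3), "-", 1),
    ("ed", (1, 2), "-", 2), ("vd", 1, "red", 2), ("ei", (2, 4), "-", 2),
    ("ei", (2, 3), "-", 3),
]
V, E = union_graph(pattern)
```

Because the union graph ignores t and the operator kind, repeated insertions and deletions of the same edge contribute a single edge to E.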

<Pattern Enumeration Problem 2 (Extended Problem)>

Suppose that a collection of graph sequences expressed as $DB = \{ (tid_i, d_i) \mid d_i = \langle g_i^{(1)} \ldots g_i^{(t_i)} \rangle \}$ and σ′ are given as inputs. In this case, the problem is to enumerate each frequent pattern expressed below in the transformation operator sequence representation, where the union graph is connected.

$OP_{[*, o_1, l_1]}^{(0)} \cdots OP_{[*, o_k, l_k]}^{(n-1)}$  [Math. 28]

This processing is executed by the extraction unit 18.

<Theorem 1> The support has the anti-monotonicity property with respect to the sequence length of the pattern.
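One way to read Theorem 1, written with the support σ of Math. 27, is the following sketch. The symbol $\sqsubseteq$ for the subsequence relation is an assumption of this note. The implication holds because every $d_i$ whose seq($d_i$) contains s also contains each subsequence s′ of s, so the set of tids counted for s is a subset of the set counted for s′.

```latex
s' \sqsubseteq s \;\Longrightarrow\; \sigma(s') \ge \sigma(s)
```

Consequently, if σ(s′) falls below the threshold σ′, no sequence obtained by extending s′ can be frequent, which is the pruning rule of the Apriori algorithm.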

<Theorem 2> Suppose that a collection of graph data sequences expressed as $DB = \{ (tid_i, d_i) \mid d_i = \langle g_i^{(1)} \ldots g_i^{(t_i)} \rangle \}$ and σ′ are given as inputs. Here, let the sets of all the patterns outputted in Pattern Enumeration Problems 1 and 2 be $P_1$ and $P_2$, respectively. In this case, the following expression is derived.

$P_1 \subset P_2$  [Math. 29]

As described above, the object of the present invention is to mine a pattern which is readable and has fewer constraints (namely, a versatile pattern). According to the definition of the union graph in the transformation operator sequence representation, when the union graph is connected, it can be said that any two vertices $v_i$ and $v_j$ in the transformation operator sequence representation are associated with each other. Hence, the patterns outputted in Pattern Enumeration Problem 2 are readable. Although the proof is omitted due to space limitation, it is considered, according to Theorem 2, that the patterns outputted in Pattern Enumeration Problem 1 can be obtained by imposing additional constraints on the patterns outputted in Pattern Enumeration Problem 2. Hereafter, the discussion focuses on Pattern Enumeration Problem 2.

When the operators OP were defined above, the order in which the operations are applied was not discussed in detail. In the following, commutative properties of the operators are described. The properties involving relabeling can be defined similarly, although they are omitted here due to space limitation. The following explanation is given based on the assumption that t<t′<t″. It should be noted that the order of operators is changed by the order changing unit 16.

<Vertex Insertion→Vertex Insertion>

Consideration is given to the case where vertices with the unique IDs i and j are to be inserted. Suppose that the vertex with the unique ID i is first inserted and then the vertex with the unique ID j is inserted into the graph $g^{(t)}$, so that a graph $g^{(t'')}$ is generated. Here, even if the order of insertions is changed as follows, an isomorphic graph $g^{(t'')}$ is generated.

$\begin{matrix}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{{vi},j,l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{{vi},i,l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{{vi},i,l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{{vi},j,l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}} & \left\lbrack {{Math}.\mspace{14mu} 30} \right\rbrack\end{matrix}$

<Vertex Insertion→Vertex Deletion>

Consideration is given to the case where the vertex with the unique ID i is first inserted and then the vertex with the unique ID j is deleted. When i≠j and the graph $g^{(t'')}$ is generated according to this operation, an isomorphic graph $g^{(t'')}$ is generated even if the order of the two operations is changed as follows. On the other hand, when i=j, the order cannot be changed because the inserted vertex is the one to be deleted.

$\begin{matrix}{{{{if}\mspace{14mu} i} \neq {j\begin{pmatrix}{{{that}\mspace{14mu}{is}},{{if}\mspace{14mu}{the}\mspace{14mu}{inserted}\mspace{14mu}{vertex}\mspace{14mu}{is}\mspace{14mu}{not}\mspace{14mu}{to}\mspace{14mu}{be}}} \\{\mspace{14mu}{deleted}}\end{pmatrix}}}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{{vd},j,l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{{vi},i,l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{{vi},i,l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{{vd},j,l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}}{{else}\mspace{14mu}{inapplicable}}} & \left\lbrack {{Math}.\mspace{14mu} 31} \right\rbrack\end{matrix}$

<Vertex Deletion→Vertex Insertion>

The vertex with the unique ID i is first deleted and then the vertex with the unique ID j is inserted. Since the deleted vertex and the inserted vertex have different unique IDs (i≠j), the order can be changed.

$\begin{matrix}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{{vi},j,l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{{vd},i,l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{{vd},i,l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{{vi},j,l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}} & \left\lbrack {{Math}.\mspace{14mu} 32} \right\rbrack\end{matrix}$

An edge insertion is expressed as follows.

$OP_{[ei, (i, j), l]}^{(t)}$  [Math. 33]

An edge deletion is expressed as follows.

$OP_{[ed, (i, j), l]}^{(t)}$  [Math. 34]

In the present embodiment, the edge change (an edge insertion or an edge deletion) is expressed as follows.

$OP_{[e, (i, j), l]}^{(t)}$  [Math. 35]

<Vertex Insertion→Edge Change>

$\begin{matrix}{{{{if}\mspace{14mu} i} \neq {j\mspace{14mu}{and}\mspace{14mu} i} \neq {k\begin{pmatrix}{{{that}\mspace{14mu}{is}},{{if}\mspace{20mu}{an}\mspace{14mu}{edge}\mspace{14mu}{of}\mspace{14mu}{the}\mspace{14mu}{inserted}\mspace{14mu}{vertex}}} \\{\mspace{31mu}{{is}\mspace{14mu}{not}\mspace{20mu}{to}\mspace{14mu}{be}\mspace{14mu}{inserted}\mspace{14mu}{or}\mspace{14mu}{deleted}}}\end{pmatrix}}}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{e,{({j,k})},l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{{vi},i,l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{{vi},i,l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{e,{({j,k})},l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}}{{else}\mspace{14mu}{inapplicable}}} & \left\lbrack {{Math}.\mspace{14mu} 36} \right\rbrack\end{matrix}$

<Edge Change→Vertex Insertion>

$\begin{matrix}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{{vi},k,l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{e,{({i,j})},l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{e,{({i,j})},l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{{vi},k,l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}} & \left\lbrack {{Math}.\mspace{14mu} 37} \right\rbrack\end{matrix}$

<Vertex Deletion→Vertex Deletion>

$\begin{matrix}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{{vd},j,l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{{vd},i,l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{{vd},i,l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{{vd},j,l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}} & \left\lbrack {{Math}.\mspace{14mu} 38} \right\rbrack\end{matrix}$

<Vertex Deletion→Edge Change>

$\begin{matrix}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{e,{({j,k})},l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{{vd},i,l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{{vd},i,l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{e,{({j,k})},l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}} & \left\lbrack {{Math}.\mspace{14mu} 39} \right\rbrack\end{matrix}$

<Edge Change→Vertex Deletion>

$\begin{matrix}{\text{if } k \neq i \text{ and } k \neq j \text{ (that is, if an edge of the deleted vertex is not to be changed)}} \\ {g^{(t'')} = OP_{[vd,k,l_2]}^{(t')} OP_{[e,(i,j),l_1]}^{(t)} g^{(t)} \Rightarrow g^{(t'')} = OP_{[e,(i,j),l_1]}^{(t)} OP_{[vd,k,l_2]}^{(t')} g^{(t)}} \\ {\text{else inapplicable}}\end{matrix}$  [Math. 40]

<Edge Change→Edge Change>

$\begin{matrix}{g^{(t^{''})} = {\left. {{OP}_{\lbrack{e,{({k,h})},l_{2}}\rbrack}^{(t^{\prime})}{OP}_{\lbrack{e,{({i,j})},l_{1}}\rbrack}^{(t)}g^{(t)}}\Rightarrow g^{(t^{''})} \right. = {{OP}_{\lbrack{e,{({i,j})},l_{1}}\rbrack}^{(t)}{OP}_{\lbrack{e,{({k,h})},l_{2}}\rbrack}^{(t^{\prime})}g^{(t)}}}} & \left\lbrack {{Math}.\mspace{14mu} 41} \right\rbrack\end{matrix}$
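The commutation rules of this section can be collected into one predicate, sketched below. The function name `commutes` and the tuple layout `(kind, object, label)` are hypothetical. Three choices are assumptions of this sketch rather than statements of the embodiment: the condition for an edge change followed by a vertex deletion is read as "the deleted vertex is not an endpoint of the changed edge"; two changes to the same edge are conservatively treated as non-commutative; and relabeling operators, whose rules are omitted in the text, are never swapped.

```python
def _category(kind):
    """Map an operator kind to the categories used in the commutation rules."""
    if kind in ("ei", "ed"):
        return "e"          # edge change
    if kind in ("vi", "vd"):
        return kind
    return None             # relabeling: rules are not given in the text


def commutes(first, second):
    """True if `first` (applied earlier) and `second` (applied later) may be swapped."""
    c1, o1 = _category(first[0]), first[1]
    c2, o2 = _category(second[0]), second[1]
    if c1 is None or c2 is None:
        return False                      # conservative: no rule available
    if c1 == "vi" and c2 == "vd":
        return o1 != o2                   # the inserted vertex must not be the deleted one
    if c1 == "vi" and c2 == "e":
        return o1 not in o2               # the edge must not touch the inserted vertex
    if c1 == "e" and c2 == "vd":
        return o2 not in o1               # assumption: the edge must not touch the deleted vertex
    if c1 == "e" and c2 == "e":
        return set(o1) != set(o2)         # assumption: same edge is not swapped
    return True                           # vi-vi, vd-vi, e-vi, vd-vd, vd-e


# A few checks against the operators of Table 2.
can_swap_unrelated = commutes(("vi", 4, "red"), ("ei", (1, 2), "-"))
can_swap_attached = commutes(("vi", 4, "red"), ("ei", (3, 4), "-"))
can_swap_edge_then_delete = commutes(("ed", (1, 2), "-"), ("vd", 1, "red"))
```

A predicate of this kind would let an order-changing step move operators past each other only when the resulting graph stays isomorphic.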

<3. Pattern Enumeration Algorithm>

As described in the preceding section, the changes in the graphs can be expressed using the operators. The commutative properties of these operators have been described as well. Before explaining the pattern enumeration algorithm in detail, the concept is first described using specific examples. It should be noted that the pattern enumeration processing is performed by the subsequence candidate generation unit 20 and the appearance frequency calculation unit 22 included in the extraction unit 18. One of the output patterns is shown in FIG. 6, and this pattern is represented as follows.

$g^{(4)} = OP_{[ei,(2,3),-]}^{(3)} OP_{[ei,(2,4),-]}^{(2)} OP_{[vd,1,red]}^{(2)} OP_{[ed,(1,2),-]}^{(2)} OP_{[ed,(2,3),-]}^{(1)} OP_{[ei,(3,4),-]}^{(1)} OP_{[vi,4,red]}^{(1)} OP_{[ei,(2,3),-]}^{(0)} OP_{[ei,(1,2),-]}^{(0)} OP_{[vi,3,blue]}^{(0)} OP_{[vi,2,blue]}^{(0)} OP_{[vi,1,red]}^{(0)} \bot$  (1)  [Math. 42]

Table 2 shows the operators corresponding to the individual applications. Consideration is given to the case where the order of these operators is changed within the limits of commutativity. Table 3 shows one example of changing the order, and this order change is represented in FIG. 7. As can be seen from FIG. 7, the graph is gradually expanded by treating the insertion of one vertex and the insertions of the edges connected to this vertex as one set. The original changing graph sequential pattern (1) can be obtained by rearranging the operators in the order of application.

TABLE 2 Transformation Operator Representation of FIG. 1

| Step | Operator application |
|---|---|
| 1 | $g_1^{(0)} = OP_{[vi,1,red]}^{(0)} \bot$ |
| 1 | $g_2^{(0)} = OP_{[vi,2,blue]}^{(0)} g_1^{(0)}$ |
| 1 | $g_3^{(0)} = OP_{[vi,3,blue]}^{(0)} g_2^{(0)}$ |
| 1 | $g_4^{(0)} = OP_{[ei,(1,2),-]}^{(0)} g_3^{(0)}$ |
| 1 | $g^{(1)} = OP_{[ei,(2,3),-]}^{(0)} g_4^{(0)}$ |
| 2 | $g_1^{(1)} = OP_{[vi,4,red]}^{(1)} g^{(1)}$ |
| 2 | $g_2^{(1)} = OP_{[ei,(3,4),-]}^{(1)} g_1^{(1)}$ |
| 2 | $g^{(2)} = OP_{[ed,(2,3),-]}^{(1)} g_2^{(1)}$ |
| 3 | $g_1^{(2)} = OP_{[ed,(1,2),-]}^{(2)} g^{(2)}$ |
| 3 | $g_2^{(2)} = OP_{[vd,1,red]}^{(2)} g_1^{(2)}$ |
| 3 | $g^{(3)} = OP_{[ei,(2,4),-]}^{(2)} g_2^{(2)}$ |
| 4 | $g^{(4)} = OP_{[ei,(2,3),-]}^{(3)} g^{(3)}$ |

On the other hand, Table 4 and FIG. 8 show a method of expanding the graph by treating the insertion of one edge, or the insertions of one edge and one vertex, as one set. Let attention be focused only on the growth of the topology, ignoring the application order t and the like. In this case, the former is a pattern growth approach according to the AcGM algorithm (see Non-Patent Reference 3, for example) (although both the AcGM and FSG algorithms are based on the “candidate generate and test” approach instead of the pattern growth approach, the term “pattern growth” is used here for both of them). The latter is a pattern growth approach according to the gSpan algorithm (see Non-Patent Reference 10, for example). Moreover, according to a different order of operators, it is possible for the pattern to grow from a path, then to a free tree, and then to a graph in this order, as in the case of the Gaston algorithm (see Non-Patent Reference 7, for example). As described thus far, the proposed method is highly versatile in that various kinds of existing frequent graph mining methods can be integrated through the change in the order of operators.

A scaffold sequence s′ of s in the transformation operator sequence representation is defined as follows.

Suppose that $t_1 < t_2$ and $o_1 = o_2$ in the following expression.

$OP_{[*, o_1, l_1]}^{(t_1)},\ OP_{[*, o_2, l_2]}^{(t_2)} \in s$  [Math. 43]

In this case, s′ is defined as a subsequence of s. Here, the sequence s′ is composed of the operators of the following form, that is, the earlier of the operators that share the same o.

$OP_{[*, o_1, l_1]}^{(t_1)}$  [Math. 44]

The operators from $g_1$ to $g_8$ in Table 3 and the operators from $g_1$ to $g_8$ in Table 4 form the respective scaffold sequences.
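The scaffold sequence can be sketched as a filter that keeps, for each vertex or edge o, only the operator with the smallest t. This reading of the definition is an assumption of the sketch, chosen because it reproduces the operators $g_1$ to $g_8$ of Tables 3 and 4. The function name `scaffold` and the tuple layout `(kind, object, label, t)` are hypothetical.

```python
def _normalize(obj):
    """Edges are undirected, so (i, j) and (j, i) denote the same object."""
    return tuple(sorted(obj)) if isinstance(obj, tuple) else obj


def scaffold(sequence):
    """Keep the earliest operator for each vertex ID or edge in the sequence."""
    seen, result = set(), []
    # sorted() is stable, so operators sharing the same t keep their given order.
    for op in sorted(sequence, key=lambda op: op[3]):
        obj = _normalize(op[1])
        if obj not in seen:
            seen.add(obj)
            result.append(op)
    return result


# Operators of Table 2 as (kind, object, label, t).
pattern = [
    ("vi", 1, "red", 0), ("vi", 2, "blue", 0), ("vi", 3, "blue", 0),
    ("ei", (1, 2), "-", 0), ("ei", (2, 3), "-", 0),
    ("vi", 4, "red", 1), ("ei", (3, 4), "-", 1), ("ed", (2, 3), "-", 1),
    ("ed", (1, 2), "-", 2), ("vd", 1, "red", 2), ("ei", (2, 4), "-", 2),
    ("ei", (2, 3), "-", 3),
]
scaffold_ops = scaffold(pattern)
```

For the pattern of Table 2 this keeps eight operators: the four vertex insertions and the first insertion of each of the four edges. The union graph of these eight operators equals the union graph of the full pattern, in line with Theorem 4.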

TABLE 3 Change in Transformation Operators of Table 2

| Step | Operator application |
|---|---|
| 1 | $g_1 = OP_{[vi,4,red]}^{(1)} \bot$ |
| 2 | $g_2 = OP_{[vi,2,blue]}^{(0)} g_1$ |
| 2 | $g_3 = OP_{[ei,(2,4),-]}^{(2)} g_2$ |
| 3 | $g_4 = OP_{[vi,1,red]}^{(0)} g_3$ |
| 3 | $g_5 = OP_{[ei,(1,2),-]}^{(0)} g_4$ |
| 4 | $g_6 = OP_{[vi,3,blue]}^{(0)} g_5$ |
| 4 | $g_7 = OP_{[ei,(2,3),-]}^{(0)} g_6$ |
| 4 | $g_8 = OP_{[ei,(3,4),-]}^{(1)} g_7$ |
| 5 | $g_9 = OP_{[ed,(2,3),-]}^{(1)} g_8$ |
| 6 | $g_{10} = OP_{[ed,(1,2),-]}^{(2)} g_9$ |
| 7 | $g_{11} = OP_{[vd,1,red]}^{(2)} g_{10}$ |
| 8 | $g_{12} = g^{(4)} = OP_{[ei,(2,3),-]}^{(3)} g_{11}$ |

<Theorem 3> When the correspondence relation between the pattern s in the transformation operator sequence representation and its scaffold sequence s′ is φ, the following is satisfied.

$\begin{matrix}{\left\{ {{{OP}_{\lbrack{*{,o,l}}\rbrack}^{(t)} \in s},{{\nexists{{OP}_{\lbrack{*{,o^{\prime},l^{\prime}}}\rbrack}^{(t^{\prime})} \in {s^{\prime}\mspace{14mu}{s.t}\mspace{14mu}{OP}_{\lbrack{*{,o,l}}\rbrack}^{(t)}}}} = {\phi\left( {OP}_{\lbrack{*{,o^{\prime},l^{\prime}}}\rbrack}^{(t^{\prime})} \right)}}} \right\} \subseteq \left\{ {\phi\left( o^{\prime} \right)} \middle| {{OP}_{\lbrack{*{,o^{\prime},l^{\prime}}}\rbrack}^{(t^{\prime})} \in s^{\prime}} \right\}} & \left\lbrack {{Math}.\mspace{14mu} 45} \right\rbrack\end{matrix}$

<Theorem 4> A union graph of the pattern s in the transformation operator sequence representation is isomorphic to a union graph obtained from the scaffold sequence of the pattern s.

Accordingly, as one of the methods to obtain the frequent pattern s expressed in the transformation operator sequence representation, there is a method whereby the scaffold sequence s′ of the pattern s is generated and then transformation operators are inserted into s′ for expansion without changing the union graph of s′. In fact, it can be understood that the operators from $g_9$ onward in Table 3 and the operators from $g_9$ onward in Table 4 expand the respective patterns without changing the union graphs of the scaffold sequences. Thus, an algorithm including the following two steps can be considered:

1. first enumerating all scaffold sequences of all patterns to be extracted; and

2. sequentially expanding the pattern by inserting an operator that is not included in the scaffold sequence, without changing the union graph of the scaffold sequence.

In the above step 1, an expand operation on the scaffold sequence s is described as “expand (s)”.

TABLE 4 Change in Transformation Operators of Table 2 (2)

| Step | Operator application |
|---|---|
| 1 | $g_1 = OP_{[vi,3,blue]}^{(0)} \bot$ |
| 2 | $g_2 = OP_{[vi,2,blue]}^{(0)} g_1$ |
| 2 | $g_3 = OP_{[ei,(2,3),-]}^{(0)} g_2$ |
| 3 | $g_4 = OP_{[vi,1,red]}^{(0)} g_3$ |
| 3 | $g_5 = OP_{[ei,(1,2),-]}^{(0)} g_4$ |
| 4 | $g_6 = OP_{[vi,4,red]}^{(1)} g_5$ |
| 4 | $g_7 = OP_{[ei,(2,4),-]}^{(2)} g_6$ |
| 5 | $g_8 = OP_{[ei,(3,4),-]}^{(1)} g_7$ |
| 6 | $g_9 = OP_{[ed,(2,3),-]}^{(1)} g_8$ |
| 7 | $g_{10} = OP_{[ed,(1,2),-]}^{(2)} g_9$ |
| 8 | $g_{11} = OP_{[vd,1,red]}^{(2)} g_{10}$ |
| 9 | $g_{12} = g^{(4)} = OP_{[ei,(2,3),-]}^{(3)} g_{11}$ |

<3. 1 Expansion of Scaffold Sequence>

FIG. 9 shows a part of a search tree in which a search is made for scaffold sequences having two or fewer vertices in the union graph. Although triangles in the diagram indicate search spaces, detailed descriptions are omitted due to space limitation. The search for the scaffold sequence is made by the subsequence candidate generation unit 20 and the appearance frequency calculation unit 22. Suppose that there are two kinds of vertex labels, A and B, and one kind of edge label, -, and that relabeling is not performed. The subsequence candidate generation unit 20 generates a pattern candidate with one vertex, and the appearance frequency calculation unit 22 calculates the number of appearances of the scaffold pattern. Thus, the search is first made for the patterns with one vertex. Here, as child nodes of the root node in the search tree, a node is generated for each of the scaffold patterns, expressed as follows, that can exist with one vertex.

$OP_{[vi,1,A]}^{(0)},\ OP_{[vd,1,A]}^{(0)},\ OP_{[vi,1,B]}^{(0)},\ OP_{[vd,1,B]}^{(0)}$  [Math. 46]

Note that the unique IDs of the vertices in the patterns are represented by integer values starting from 1.

$OP_{[vi,1,A]}^{(0)}$  [Math. 47]

Next, the above pattern is expanded, so that its child nodes are generated. The pattern is expanded in such a manner that the union graph of the scaffold pattern is connected, instead of expanding the pattern so as to increase the application order t of the transformation operators. When the expansion method is based on the AcGM algorithm, a vertex and an edge associated with the vertex are inserted. When the expansion method is based on one of the FSG, gSpan, and Gaston algorithms, the pattern is expanded with an edge and a vertex associated with the edge. Here, the pattern is not expanded using a transformation operator whose o is already included in the scaffold sequence.

Attention needs to be paid to the following patterns.

$OP_{[ei,(1,2),-]}^{(0)} OP_{[vi,2,A]}^{(1)} OP_{[vi,1,A]}^{(0)} \bot$  (2)
$OP_{[ei,(1,2),-]}^{(2)} OP_{[vi,2,B]}^{(0)} OP_{[vi,1,A]}^{(1)} \bot$  (3)  [Math. 48]

In the pattern (2), at t=0, a vertex with the label A and the unique ID 1 is inserted and an edge is inserted between the pair of vertices (1, 2). Then, at t=1, a vertex with the label A and the unique ID 2 is inserted. From this information alone, since the edge (1, 2) is inserted before the vertex with the unique ID 2 is inserted, it seems impossible to insert the edge.

$OP_{[vi,2,A]}^{(2)} OP_{[vi,2,A]}^{(1)} OP_{[ed,(1,2),-]}^{(1)} OP_{[ei,(1,2),-]}^{(0)} OP_{[vi,2,A]}^{(0)} OP_{[vi,1,A]}^{(0)} \bot$  [Math. 49]

However, when the above pattern frequently appears, the pattern (2), being its subsequence, also frequently appears because of the anti-monotonicity of the support. For this reason, the pattern (2) needs to be enumerated as well.

The pattern (3) is generated by expanding the following.

$OP_{[vi,1,A]}^{(0)}$  [Math. 50]

Here, the order in which the vertex with the unique ID 1 is inserted is changed.

$OP_{[*,o,l]}^{(t)}$  [Math. 51]

Here, t in the above operator of the pattern indicates the order in which two operators are applied relative to each other. Thus, attention needs to be paid to the fact that the order in which the operators in the pattern are applied changes in this way as the pattern is expanded.

In the search tree, the same pattern may appear more than once in isomorphic forms. For example, the following two sequences are isomorphic.

$OP_{[ei,(1,2),-]}^{(0)} OP_{[vi,2,B]}^{(0)} OP_{[vi,1,A]}^{(0)} \bot$
$OP_{[ei,(1,2),-]}^{(0)} OP_{[vi,2,A]}^{(0)} OP_{[vi,1,B]}^{(0)} \bot$  [Math. 52]

It is inefficient if isomorphic patterns in different representations are repeatedly generated. To avoid this, a pattern is left in the search space only when the graph code generated from the union graph of the scaffold pattern and the unique IDs of the vertices in the union graph is a canonical code. The graph code depends on the algorithm, such as AcGM, gSpan, FSG, or Gaston, that is employed for expanding the scaffold pattern.
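The canonical-code test can be illustrated with a brute-force stand-in. The sketch below is not the AcGM, gSpan, FSG, or Gaston graph code; it simply takes the lexicographically smallest encoding over all renumberings of the vertex IDs, which is feasible only for very small union graphs. The names `graph_code`, `canonical_code`, and `is_canonical` are hypothetical, and the application order t is ignored.

```python
from itertools import permutations


def graph_code(vertex_labels, edges, rename):
    """Encode a labeled graph under a given renumbering of its vertex IDs."""
    labels = tuple(
        label for _new_id, label in
        sorted((rename[v], label) for v, label in vertex_labels.items())
    )
    edge_part = tuple(sorted(
        (min(rename[i], rename[j]), max(rename[i], rename[j]), label)
        for (i, j), label in edges.items()
    ))
    return labels, edge_part


def canonical_code(vertex_labels, edges):
    """Smallest code over all renumberings of the vertices to 1..n (brute force)."""
    ids = sorted(vertex_labels)
    return min(
        graph_code(vertex_labels, edges, dict(zip(ids, perm)))
        for perm in permutations(range(1, len(ids) + 1))
    )


def is_canonical(vertex_labels, edges):
    """True if the pattern's own vertex numbering already yields the canonical code."""
    ids = sorted(vertex_labels)
    identity = {v: index + 1 for index, v in enumerate(ids)}
    return graph_code(vertex_labels, edges, identity) == canonical_code(vertex_labels, edges)


# Union graphs of the two isomorphic sequences of Math. 52.
first = ({1: "A", 2: "B"}, {(1, 2): "-"})
second = ({1: "B", 2: "A"}, {(1, 2): "-"})
same_code = canonical_code(*first) == canonical_code(*second)
```

Under this particular encoding only the first of the two sequences is canonical, so the second would be pruned from the search tree.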

<3. 2 Pattern Expansion from Projection Data>

The scaffold sequence s is generated according to the method described in the preceding section. Then, as described in the present section, the sequence s is expanded by inserting an operator that is not included in the scaffold sequence, without changing the union graph of s. The pattern scaffold ends at $g_8$ in Table 3 and also at $g_8$ in Table 4. In the present section, the processing performed for $g_9$ and the subsequent operators is explained.

Suppose that a correspondence relation between a scaffold sequence s and data $(tid_i, d_i)$ including this sequence s is expressed as φ. In this case, a projection function “project” is defined as follows.

$\{(tid_i, d'_i)\} = project((tid_i, d_i), s)$  [Math. 53]

Here, $d'_i$ satisfies the following.

-   $d'_i$ is a subsequence of $seq(d_i)$.
-   $o'$ of $OP_{[*, o', l']}^{(t')} \in d'_i$ is included in $\{ o \mid OP_{[*, o, l]}^{(t)} \in d_i,\ OP_{[*, o_s, l_s]}^{(t_s)} \in s \text{ s.t. } OP_{[*, o, l]}^{(t)} = \phi(OP_{[*, o_s, l_s]}^{(t_s)}) \}$.  [Math. 54]
-   When, for $OP_{[*, o', l']}^{(t')} \in d'_i$, there exist $OP_{[*, o, l]}^{(t)} \in d_i$ and $OP_{[*, o_s, l_s]}^{(t_s)} \in s$ where $o = o'$ and $OP_{[*, o, l]}^{(t)} = \phi(OP_{[*, o_s, l_s]}^{(t_s)})$, $t \le t'$ holds.
-   $d'_i$ has the maximal sequence length that satisfies the above.

EXAMPLE 3

Suppose that a scaffold sequence s and sequence data $d_i$ are expressed by the following equations, respectively, in the transformation operator sequence representation.

$\begin{matrix}{{s = \left\langle {{OP}_{\lbrack{{vi},1,A}\rbrack}^{(1)}{OP}_{\lbrack{{vi},2,B}\rbrack}^{(2)}{OP}_{\lbrack{{ei},{({1,2})}, -}\rbrack}^{(2)}{OP}_{\lbrack{{vi},3,C}\rbrack}^{(3)}{OP}_{\lbrack{{ei},{({2,3})}, -}\rbrack}^{(3)}} \right\rangle}{{{seq}\left( d_{i} \right)} = \left\langle {{OP}_{\lbrack{{vi},1,D}\rbrack}^{(1)}{OP}_{\lbrack{{vi},2,A}\rbrack}^{(1)}{OP}_{\lbrack{{ei},{({1,2})}, -}\rbrack}^{(1)}{OP}_{\lbrack{{vi},3,B}\rbrack}^{(2)}{OP}_{\lbrack{{ei},{({2,3})}, -}\rbrack}^{(2)}{OP}_{\lbrack{{ed},{({2,3})}, -}\rbrack}^{(3)}{OP}_{\lbrack{{ei},{({1,3})}, -}\rbrack}^{(3)}{OP}_{\lbrack{{ed},{({1,2})}, -}\rbrack}^{(4)}{OP}_{\lbrack{{vd},2,A}\rbrack}^{(4)}{OP}_{\lbrack{{vi},4,C}\rbrack}^{(4)}{OP}_{\lbrack{{ei},{({3,4})}, -}\rbrack}^{(4)}{OP}_{\lbrack{{vi},2,B}\rbrack}^{(5)}{OP}_{\lbrack{{ed},{({3,4})}, -}\rbrack}^{(5)}{OP}_{\lbrack{{ei},{({1,4})}, -}\rbrack}^{(5)}} \right\rangle}} & \left\lbrack {{Math}.\mspace{14mu} 55} \right\rbrack\end{matrix}$

Here, project ((tid_(i), d_(i)), s) is expressed as follows.

$\begin{matrix}\left\lbrack {{Math}.\mspace{14mu} 56} \right\rbrack & \; \\{{{project}\left( {\left( {{tid}_{i},d_{i}} \right),s} \right)} = \left\{ \left( {{tid}_{i},\left\langle {{OP}_{\lbrack{{vi},2,A}\rbrack}^{(1)}{OP}_{\lbrack{{vi},3,B}\rbrack}^{(2)}{OP}_{\lbrack{{ei},{({2,3})}, -}\rbrack}^{(2)}{OP}_{\lbrack{{ed},{({2,3})}, -}\rbrack}^{(3)}{OP}_{\lbrack{{vd},2,A}\rbrack}^{(4)}{OP}_{\lbrack{{vi},4,C}\rbrack}^{(4)}{OP}_{\lbrack{{ei},{({3,4})}, -}\rbrack}^{(4)}{OP}_{\lbrack{{vi},2,B}\rbrack}^{(5)}{OP}_{\lbrack{{ed},{({3,4})}, -}\rbrack}^{(5)}} \right\rangle} \right) \right\}} & (4)\end{matrix}$
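The projection of Example 3 can be reproduced with the following sketch, which assumes that the correspondence φ is already known and is supplied as a mapping from vertex IDs in s to vertex IDs in $d_i$. The function name `project`, its signature, and the tuple layout `(kind, object, label, t)` are hypothetical; the embodiment does not specify how φ is found. As a further assumption, the operator of $d_i$ matched to an operator of s is taken to be the earliest one with the same kind, mapped object, and label.

```python
def _normalize(obj):
    return tuple(sorted(obj)) if isinstance(obj, tuple) else obj


def _map_object(obj, phi):
    """Translate a vertex ID or an edge of the scaffold into IDs of the data."""
    if isinstance(obj, tuple):
        return tuple(sorted(phi[v] for v in obj))
    return phi[obj]


def project(seq_d, s, phi):
    """Operators of seq_d on objects matched by s, no earlier than the matched operator."""
    start = {}  # matched object in the data -> t of the operator matched to s
    for kind, obj, label, _t in s:
        target = _map_object(obj, phi)
        times = [t for k, o, l, t in seq_d
                 if k == kind and _normalize(o) == target and l == label]
        if not times:
            return None           # s does not occur in seq_d under phi
        start[target] = min(times)
    return [op for op in seq_d
            if _normalize(op[1]) in start and op[3] >= start[_normalize(op[1])]]


# Scaffold sequence s and seq(d_i) of Example 3.
s = [("vi", 1, "A", 1), ("vi", 2, "B", 2), ("ei", (1, 2), "-", 2),
     ("vi", 3, "C", 3), ("ei", (2, 3), "-", 3)]
seq_d = [
    ("vi", 1, "D", 1), ("vi", 2, "A", 1), ("ei", (1, 2), "-", 1),
    ("vi", 3, "B", 2), ("ei", (2, 3), "-", 2),
    ("ed", (2, 3), "-", 3), ("ei", (1, 3), "-", 3),
    ("ed", (1, 2), "-", 4), ("vd", 2, "A", 4), ("vi", 4, "C", 4), ("ei", (3, 4), "-", 4),
    ("vi", 2, "B", 5), ("ed", (3, 4), "-", 5), ("ei", (1, 4), "-", 5),
]
phi = {1: 2, 2: 3, 3: 4}          # vertex IDs of s -> vertex IDs of d_i
projected = project(seq_d, s, phi)
```

With this φ, the nine operators of sequence (4) are returned, and every operator touching vertex 1 of $d_i$ is dropped because vertex 1 has no counterpart in s.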

The sequence (4) is expressed as follows when the operators having the same application order t are parenthesized and t is thus removed.

$OP_{[vi,2,A]} (OP_{[vi,3,B]} OP_{[ei,(2,3),-]}) OP_{[ed,(2,3),-]} (OP_{[vd,2,A]} OP_{[vi,4,C]} OP_{[ei,(3,4),-]}) (OP_{[vi,2,B]} OP_{[ed,(3,4),-]})$  [Math. 57]

Accordingly, the sequence can be regarded as being in the sequence representation of sequential pattern mining, in which an operator is treated as an item. The following is generated from the input database and the scaffold pattern s.

$DB'(s) = \{ (tid_i, d'_i) \mid (tid_i, d_i) \in DB,\ (tid_i, d'_i) \in project((tid_i, d_i), s) \}$  [Math. 58]

With the above being an input for the sequential pattern mining, the pattern can be sequentially expanded without changing the union graph of the scaffold sequence s.
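The step from sequence (4) to the parenthesized form of Math. 57 amounts to grouping operators by their application order t, as the short sketch below shows. The function name `to_itemsets` and the tuple layout are hypothetical, and the input is assumed to be ordered by t already.

```python
from itertools import groupby


def to_itemsets(operators):
    """Group (kind, object, label, t) operators by t and drop t from each item."""
    return [
        [op[:3] for op in group]
        for _t, group in groupby(operators, key=lambda op: op[3])
    ]


# Sequence (4) of Example 3.
sequence_4 = [
    ("vi", 2, "A", 1), ("vi", 3, "B", 2), ("ei", (2, 3), "-", 2),
    ("ed", (2, 3), "-", 3), ("vd", 2, "A", 4), ("vi", 4, "C", 4),
    ("ei", (3, 4), "-", 4), ("vi", 2, "B", 5), ("ed", (3, 4), "-", 5),
]
itemsets = to_itemsets(sequence_4)
```

Each inner list plays the role of one itemset, so a list of such lists has the shape that sequential pattern mining methods such as PrefixSpan take as input.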

<3. 3 Pseudo-Codes>

FIG. 10 shows pseudo-code of the proposed method implemented by the frequent changing pattern extraction device 100. As inputs, a database DB, which is a collection of sequence data, and a support threshold σ′ are given. In Line 7, the scaffold sequence is expanded. In Line 9, it is verified whether or not the scaffold sequence s is canonical. This corresponds to the processing of “if s=min(s)” in the pseudo-code of the gSpan algorithm (see Non-Patent Reference 10, for example). In Line 15, the projection data is generated using the enumerated scaffold sequences. Then, according to the sequential pattern mining method, all the patterns each having a union graph isomorphic to the union graph of the scaffold sequence are enumerated. FIG. 10 shows a method of enumerating patterns by breadth-first search. Similarly, it is possible to design a method of enumerating patterns by depth-first search.

<4. Evaluation Experiment and Consideration>

An evaluation experiment was carried out for the method described up to the preceding section. The method was implemented in C++, and a personal computer (PC) with a 1.66-GHz Core Duo CPU and 1.5-GB memory was used. For the sequential pattern mining, the PrefixSpan algorithm (see Non-Patent Reference 8, for example) was used. Table 5 shows a summary of the meanings and default values of the parameters of the artificial data used in the present experiment. Firstly, N labeled graphs each having an average of $|V_{avg}|$ vertices are generated. The vertex labels are chosen with equal probability from $|L_v|$ labels, and the existence probability of an edge between two vertices is determined according to $p_e$. Each such graph is the union graph of a basic pattern. Each basic pattern starts from the following.

$\bot$  [Math. 59]

Until the union graph of the operator sequence becomes isomorphic to the previously-generated union graph, the operator sequence of the basic pattern is generated by inserting transformation operators one at a time. The operators are limited to those for inserting or deleting a vertex or an edge. A target vertex or edge is randomly selected, and whether to insert or delete the target is determined according to the probability $p_i$. In this way, $|DB|$ graph sequences are generated.

Then, one basic pattern is written over each of the following.

$(tid_i, d_i) \in DB$  [Math. 60]

TABLE 5 Default Values for Generating Experiment Data

| Parameter | Default Value |
|---|---|
| Insert selection probability of data | $p_i$ = 80% |
| Insert selection probability of basic pattern | $p'_i$ = 50% |
| Average number of unique IDs in basic pattern | $\lvert V_{avg} \rvert$ = 5 |
| Average number of unique IDs in data | $\lvert V'_{avg} \rvert$ = 7 |
| Number of vertex labels | $\lvert L_v \rvert$ = 5 |
| Number of edge labels | $\lvert L_e \rvert$ = 1 |
| Number of basic patterns | N = 10 |
| Number of data sets in DB | $\lvert DB \rvert$ = 10,000 |
| Edge existence probability | $p_e$ = 20% |
| Support threshold | σ′ = 10% |

Some of the results are shown in FIGS. 11 to 13. FIG. 11 shows variations in the calculation time with respect to variations in $|DB|$. It can be seen that the calculation time increases in proportion to the number of data pieces. FIG. 12 shows variations in the calculation time with respect to variations in $p'_i$. Note that the horizontal axis denotes the average number of operators in the sequences. As $p'_i$ decreases, the average number of operators increases and the calculation time increases exponentially. FIG. 13 shows variations in the calculation time with respect to variations in σ′. As σ′ decreases, the calculation time increases.

As described thus far, the present invention proposes a method of enumerating readable frequent changing graph sequential patterns that are included in labeled graph sequences. Since the graph transformation operations are defined and the order in which the operations are applied is changed, the patterns can be enumerated efficiently. Moreover, the evaluation experiment was carried out for the proposed method using the artificial data, and it was shown how the calculation time varies depending on the data characteristics.

The present invention allows the graph changes to be expressed using the operators. Thus, the changes in graphs (i.e., network structure) can be represented by an operator sequence. Based on the anti-monotonicity used in the Apriori algorithm, a frequent operator subsequence can be extracted. Since the operator sequence represents the changes in graphs, a frequent pattern of change in the graphs can be extracted.

Moreover, a graph which is not connected to a union graph is considered difficult for people to interpret. On account of this, a graph which is not connected to a union graph is removed, so that only the operator sequences included in the union graph become the targets of the processing. As a result, only operator subsequences (the patterns of change in the graphs) which are useful to people can be extracted. Furthermore, the number of operator sequences to be evaluated by the extraction unit can be reduced, and therefore the processing can be performed at high speed.

Also, the order changing unit 16 changes the order in which the operators are applied, thereby making it easier to apply the anti-monotonicity used in the Apriori algorithm.

The above embodiment describes a case where the changing graph sequence storage unit 10 stores a plurality of graph sequences. However, note that the changing graph sequence storage unit 10 may store only one graph sequence. In such a case, the frequent changing pattern extraction device 100 extracts an operator subsequence which appears at least a predetermined number of times in one operator sequence converted from one graph sequence.

Applications of such a frequent changing pattern extraction device include analyzing e-mail messages. For example, a graph g^(t) is generated, in which a vertex corresponds to a person (namely, an e-mail address) and an edge corresponds to a connection between e-mail addresses between which e-mail messages have been exchanged. By analyzing the e-mail messages with such a graph as a starting point, it is possible to extract a person who is going to be a hub in a community.
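The e-mail graphs described above could be built as in the following minimal sketch. The message record layout `(time step, sender, recipient)` and the function names are assumptions made for illustration, and vertex degree is used only as a simple stand-in for how "hub-like" an address is at each time step.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Message = Tuple[int, str, str]  # (time step, sender, recipient), assumed
Edge = Tuple[str, str]
MailGraph = Tuple[Set[str], Set[Edge]]


def build_graph_sequence(messages: Iterable[Message]) -> List[MailGraph]:
    """Build one graph per time step: vertices are addresses, edges are exchanges."""
    vertices: Dict[int, Set[str]] = defaultdict(set)
    edges: Dict[int, Set[Edge]] = defaultdict(set)
    for t, sender, recipient in messages:
        vertices[t].update((sender, recipient))
        if sender != recipient:
            # Undirected edge: the order of the two addresses is normalized.
            edges[t].add(tuple(sorted((sender, recipient))))
    return [(vertices[t], edges[t]) for t in sorted(vertices)]


def degree_over_time(sequence: List[MailGraph], address: str) -> List[int]:
    """Number of distinct correspondents of an address at each time step."""
    return [sum(address in e for e in es) for _, es in sequence]
```

A steadily growing degree for one address is the kind of change that a frequent pattern extracted from many such sequences might capture.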

The embodiment disclosed thus far is merely an example in all respects and is not intended to limit the scope of the present invention. It is intended that the scope of the present invention not be limited by the described embodiment, but be defined by the claims set forth below. Meanings equivalent to the description of the claims and all modifications are intended for inclusion within the scope of the following claims.

INDUSTRIAL APPLICABILITY

The present invention can be applied to a frequent changing pattern extraction device which extracts a pattern of change frequently appearing in a network structure that changes from moment to moment. In particular, the present invention can be applied to, for example: a drug-discovery support device which supports drug discovery by extracting a pattern of change frequently appearing in changes in genetic structure; and an executive-candidate discovery support device which supports discovery of executive candidates by extracting a common pattern of change in human relations occurring to persons who are going to become hubs in a human relation network.

1. A frequent changing pattern extraction device comprising: a processor; a conversion unit configured to convert, using said processor, a graph sequence into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, the graph sequence including a plurality of graphs that show temporal changes in the graphs, and each of the graphs including a vertex corresponding to a data piece and an edge corresponding to a link between data pieces; and an extraction unit configured to extract an operator subsequence that appears at least a predetermined number of times in the operator sequence, based on anti-monotonicity used in the Apriori algorithm, wherein there are a plurality of graph sequences, wherein said conversion unit is configured to convert each of the plurality of graph sequences into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, and wherein said extraction unit is configured to extract an operator subsequence that appears at least the predetermined number of times in the operator sequences corresponding to the plurality of graph sequences, based on the anti-monotonicity used in the Apriori algorithm.
2. The frequent changing pattern extraction device according to claim 1, wherein the operations indicated by the operators include at least one of a vertex insertion, a vertex deletion, a vertex relabeling, an edge insertion, an edge deletion, and an edge relabeling.
3. The frequent changing pattern extraction device according to claim 1, further comprising a sequence-for-union-graph generation unit configured to generate an operator sequence corresponding to a union graph obtained by removing a vertex that is not connected to another vertex from a graph configured by a union of vertices and a union of edges of the plurality of graphs included in the graph sequence, wherein said extraction unit is configured to extract an operator subsequence that appears at least a predetermined number of times in the operator sequence generated by said sequence-for-union-graph generation unit, based on the anti-monotonicity used in the Apriori algorithm.
4. The frequent changing pattern extraction device according to claim 1, further comprising an order changing unit configured to change an order in which the operators included in the operator sequence converted by said conversion unit are arranged, so that the temporal changes in the graphs expressed by a resulting operator sequence are represented by vertices that increase in number over time, wherein said extraction unit is configured to extract an operator subsequence that appears at least a predetermined number of times in the operator sequence obtained as a result of the order change executed by said order changing unit, based on the anti-monotonicity used in the Apriori algorithm.
5. The frequent changing pattern extraction device according to claim 1, wherein said extraction unit includes: a subsequence candidate generation unit configured to generate operator subsequence candidates while increasing the number of included operators by one each time; and an appearance frequency calculation unit configured to calculate the number of appearances, in the operator sequence, for each of the operator subsequence candidates, wherein said subsequence candidate generation unit is configured to increase the number of operators by one for only an operator subsequence candidate, out of the operator subsequence candidates, whose number of appearances calculated by said appearance frequency calculation unit is a predetermined number of times or more, so as to update the operator subsequence candidates.
6. A frequent changing pattern extraction method comprising: converting, using a processor, a graph sequence into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, the graph sequence including a plurality of graphs that show temporal changes in the graphs, and each of the graphs including a vertex corresponding to a data piece and an edge corresponding to a link between data pieces; and extracting an operator subsequence that appears at least a predetermined number of times in the operator sequence, based on anti-monotonicity used in the Apriori algorithm, wherein there are a plurality of graph sequences, said conversion unit is configured to convert each of the plurality of graph sequences into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, and wherein said extraction unit is configured to extract an operator subsequence that appears at least the predetermined number of times in the operator sequences corresponding to the plurality of graph sequences, based on the anti-monotonicity used in the Apriori algorithm.
7. A non-transitory computer readable recording medium having stored thereon a program, wherein, when executed, the program causes a computer to execute a method comprising: converting a graph sequence into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, the graph sequence including a plurality of graphs that show temporal changes in the graphs, and each of the graphs including a vertex corresponding to a data piece and an edge corresponding to a link between data pieces; and extracting an operator subsequence that appears at least a predetermined number of times in the operator sequence, based on anti-monotonicity used in the Apriori algorithm, wherein there are a plurality of graph sequences, wherein said converting converts each of the plurality of graph sequences into an operator sequence by expressing changes, from a first graph included in the graph sequence to a second graph which is temporally adjacent to the first graph, using operators indicating operations necessary to transform the first graph into the second graph, and wherein said extracting extracts an operator subsequence that appears at least the predetermined number of times in the operator sequences corresponding to the plurality of graph sequences, based on the anti-monotonicity used in the Apriori algorithm.